Abstract

Quantum key distribution (QKD) promises unconditionally secure key generation between two distant parties by wisely exploiting properties of quantum mechanics. In QKD, experimental measurements on quantum states are transformed to a secret key and this has to be done in accordance with a security proof. Unfortunately, many theoretical proofs are not readily implementable in experiments and do not consider all practical issues. Therefore, in order to bridge this "practical gap", we integrate a few existing theoretical results together with new developments, in effect producing a simple and complete recipe for classical post-processing that one can follow to derive a secret key from the measurement outcomes in an actual QKD experiment. This integration is non-trivial and our consideration is both practical and comprehensive in the sense that we take into account the finiteness of the key length and consider the effects on security of several essential primitives (including authentication, error handling, and privacy amplification). Furthermore, we quantify the security of the final secret key that is universally composable. We show that the finite-size effect mainly comes from phase error estimation. Our result is applicable to the BB84 protocol with a single or entangled photon source.

I. INTRODUCTION

Quantum key distribution (QKD) [1, 2] allows two distant users to generate a secret key that is guaranteed to be unconditionally secure by the laws of quantum mechanics. Initial work on QKD focused on the investigation of its unconditional security, and a few QKD protocols, such as the well-known BB84 protocol, have been proven to be secure in the last decade [3, 4, 5]. Since then, many QKD experiments have been performed (see, e.g., references in Refs. [6, 7]). In general, a QKD experiment involves a quantum state transmission step (where quantum states are transmitted and measured) and a classical post-processing step (where the measurement outcomes are processed classically with the help of classical communication to generate a final secret key). Although standard security proofs (such as Ref. [5]) imply a procedure for distilling a final secret key from measurement outcomes, they cannot be applied directly to an actual QKD experiment. This is because many of these security proofs focus on the case that the key is arbitrarily long, which does not hold in practice. It is precisely this finite-size effect that leads to reduced confidence in the security of the final key (mediated by the uncertainty in post-processing tasks such as error rate estimation and error correction). Therefore, it is imperative to quantify the finite-size effect and to provide a precise post-processing recipe that one can follow for distilling final secret keys with quantified security in real QKD experiments. This is the purpose of this paper. We note that, recently, much effort has been spent on the finite-key effect in QKD post-processing; see, e.g., Refs. [8, 9].

When the key size is finite, inference on error rates and error correction can no longer be perfect as they are in the infinite-size case. More specifically, the inferred error rates could be different from the true values and there could be leftover errors after error correction. Consequently, a finite-length secret key generated by a QKD system cannot be perfect, in the sense that Alice and Bob might not share the same key and/or Eve might possess some information about the key. Nevertheless, the fact that the key is imperfect does not preclude it from being used in a subsequent task requiring a perfect key. In fact, if one can assign a probability that the key can be regarded as an ideal one, the use of the nonideal key as an ideal one is justified. Indeed, this notion of security is captured by the composable security definition [10, 11], which is widely adopted in the field. QKD is composable in the sense that the final key generated is indistinguishable from an ideal secret key except with a small failure probability. Thus, the QKD key can be used for any subsequent cryptographic application (for instance, later rounds of QKD) requiring a perfect secret key, and the total failure probability is the sum of those of the individual composable cryptographic components. In QKD, Alice and Bob may run a QKD system for many rounds. They share a certain amount of secure key prior to each round, which can be used in the data post-processing step. The key generated by one round could be used for the next round. Composability requires the key generated by all the rounds of the QKD system to be secure. In other words, an eavesdropper, Eve, knows only a limited amount of information about the key (if there is any) even after attacking all the rounds.

In this paper, a security definition with a failure probability (or confidence interval) is used. Our result quantifies the security of the final key generated in a QKD experiment with a failure probability, i.e., except with this probability the final key can be treated as an ideal secret key (identical and private). This is a natural security definition for experiments, and the aforementioned composability requirement [10, 11] is fulfilled. For instance, suppose Alice and Bob run a QKD system $10^6$ times and keep the failure probability under $\varepsilon$ for each round. Then the total failure probability is no larger than $10^6\varepsilon$. As long as they keep $\varepsilon$ well below $10^{-6}$, the key generated in these million rounds is secure. The value of $\varepsilon$ is determined by the usage of the key in a real application. Note that we use probability, which is more meaningful for experiments, instead of the trace distance [9], to quantify the security. Throughout the paper, $\varepsilon$'s with various subscripts stand for various failure probabilities.
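
The composability argument above is a union bound over rounds. The following is a minimal sketch of that arithmetic; the helper names `total_failure_bound` and `per_round_budget` are hypothetical and are introduced only for illustration.

```python
def total_failure_bound(eps_per_round: float, rounds: int) -> float:
    """Union bound: total failure probability is at most rounds * eps, capped at 1."""
    return min(1.0, rounds * eps_per_round)


def per_round_budget(total_target: float, rounds: int) -> float:
    """Per-round failure probability that keeps the union bound within total_target."""
    return total_target / rounds


if __name__ == "__main__":
    # One million rounds, each with failure probability 1e-9.
    print(total_failure_bound(1e-9, 10**6))
    # Per-round budget needed to keep the total below 1e-6 over a million rounds.
    print(per_round_budget(1e-6, 10**6))
```

The cap at 1 only matters when the per-round value is too large for the bound to be informative.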

Let us start by examining the underlying assumptions and definitions used here. We 
emphasize that in order to apply the scheme to a QKD system, one needs to compare these 
assumptions with the real setup. The assumptions used in the paper are listed as follows: 

1. Alice and Bob perform the BB84 protocol with a perfect single photon source or a basis-independent photon source.

2. The detection system is compatible with the squashing model [14, 15, 16], i.e., the input to Bob's system is assumed to be a qubit. For example, detection efficiency mismatch is not considered in this paper.

3. Alice and Bob use perfect random number generators.

4. Alice and Bob use perfect key management. They share a certain amount of secure key prior to running their QKD system.

The post-processing scheme is based on a modified Shor-Preskill security proof [5], which is essentially Koashi's complementary argument [18]. In this approach, secure key generation is equivalent to an entanglement distillation protocol, which involves bit and phase error correction. In the post-processing, the bit error correction becomes classical error correction and the phase error correction becomes privacy amplification. We remark that our result is applicable to any physical QKD implementation that complies with the above assumptions, and it does not depend on the implementation details.

The motivation of this paper is to give a guideline for QKD data post-processing. We start from raw data from measurements and some pre-shared secure key bits, and produce a secret key with a quantified security definition. This can be a stepping stone for a QKD standard. In this paper, we only present the results but not the technical details of the derivations, which will be presented in Ref. [19].

The finite key analysis is important not only from a theoretical point of view, but also for experiments. For example, the efficient BB84 protocol [20] was proposed to increase the key generation rate. The optimal bias between the two bases, X and Z, approaches 1 in the long key limit [20]. In order to choose an optimal bias in the finite key case, Alice and Bob need to consider statistical fluctuations. We remark that the proposed post-processing scheme ties together a few existing results with some new developments. Note that this integration is non-trivial and our contributions are as follows:

1. A security definition with a failure probability is used. 

2. A strict bound for the phase error estimation is derived. 

3. An authentication scheme is applied for the error verification. 

4. The efficiency of the privacy amplification is investigated. 

5. The parameter optimization is studied. 

II. POST-PROCESSING PROCEDURE 

Classical communication is assumed to be free in many security analyses of QKD. In practice, heavy classical communication may lead to a low key generation speed, especially for high-speed QKD setups. Moreover, some classical communication needs to be authenticated (or even encrypted) in the post-processing, which means that it is not entirely free. Here, we study which parts of the classical communication need to be authenticated or encrypted. For the authentication part, we rely on the LFSR-based Toeplitz matrix construction [21].

The secure key used in the post-processing comes from a pre-shared secure key between Alice and Bob. For each step, we investigate the secure-key cost, $k_{xx}$, and the failure probability, $\varepsilon_{xx}$.

The post-processing procedure is listed as follows. Note that none of the following classical communication is encrypted unless otherwise stated.

1. Key sift [not authenticated]: Bob discards no-click events and obtains an $n$-bit raw key by randomly assigning [22] bit values to the double clicks [26]. Note that other key sift procedures might be applied as well; see, for example, Ref. [23].



2. Basis sift [authenticated]: Alice and Bob send each other $n$-bit basis information. Due to the symmetry, we can assume they pick the same failure probability for this procedure [21],

$$\varepsilon_{bs} = n\,2^{-k_{bs}+1}. \tag{1}$$

Here, Alice and Bob use a $2k_{bs}$-bit secure key to construct a Toeplitz matrix with a size of $n \times k_{bs}$ by an LFSR. The authentication tag is generated by multiplying the matrix and the message. Then they encrypt the two tags by two $k_{bs}$-bit secure keys. Since the tags are encrypted by a one-time pad, the $2k_{bs}$-bit key used for the Toeplitz matrix construction is still private [21]. Hence, the total secure-key cost in this step is $2k_{bs}$ and the corresponding failure probability is $2\varepsilon_{bs}$. Note that when Alice and Bob use a biased basis choice [20], they can exchange less than $n$-bit classical information for basis sift by data compression. Since the secure-key cost depends only logarithmically on the length of the message, we simply use $n$ for the following discussion. At the end of this step, Alice and Bob obtain an $n_x$ ($n_z$)-bit sifted key in the X (Z) basis. Define the biased ratio to be $q_x = n_x/(n_x + n_z)$.

3. Error correction [not authenticated but encrypted [27]]: the secure-key cost is given by

$$k_{ec} = n_x f(e_{bx}) H_2(e_{bx}) + n_z f(e_{bz}) H_2(e_{bz}), \tag{2}$$

where $f(x)$ is the error correction efficiency and $H_2(x) = -x\log_2(x) - (1-x)\log_2(1-x)$ is the binary entropy function. In practice, Alice and Bob only need to count the amount of classical communication used in the error correction. That is, the value of $k_{ec}$ can be directly obtained from the post-processing. After the error correction, Alice and Bob count the number of errors in the X (Z) basis: $e_{bx} n_x$ ($e_{bz} n_z$).

4. Error verification: Alice and Bob want to make sure (with a high probability) that their keys after the error correction step are identical. Note that the idea of using error verification to replace error testing was proposed by Lütkenhaus [22].

Comparing the two procedures, authentication and error verification, one can see what they have in common. In order to show the link between the two procedures, we break down the authentication procedure into two parts: Alice sends to Bob the message first and then the tag. Let us take a look at the stage where Bob has just received the message but not yet the tag. Now, Alice and Bob each have a bit string. In authentication, Alice sends a tag (depending on her message) to Bob and Bob verifies it. The claim of a secure authentication scheme is that if the tag passes Bob's test, the probability that Alice and Bob share the same string is high. This can also be regarded as an error verification procedure. Hence, secure authentication schemes can be used for error verification.

Note that the only difference between the two procedures is that in general, an authen- 
tication scheme does not care whether the tag reveals information about the message 
or not, but error verification does (at least for our use in QKD post-processing). This 
difference can be easily overcome by encrypting the tag, which has already been done 
in some authentication schemes including the one we are using. 

Thus, in this procedure, Alice sends an encrypted tag of an authentication scheme to Bob. With a secure-key cost of $k_{ev}$ for this step, the failure probability is, similarly to Eq. (1),

$$\varepsilon_{ev} = (n_x + n_z)\,2^{-k_{ev}+1}. \tag{3}$$

We remark that when Alice and Bob fail the error verification, they can go back to the error correction step.

5. Phase error rate estimation [no communication]: random sampling. Alice and Bob can estimate the phase error rates in the X and Z bases, $e_{px}$ and $e_{pz}$, separately. Take the Z basis for example. The probability that $e_{pz} > e_{bx} + \theta_x$ is $P_{\theta x}$ [19]; its expression, Eq. (4), involves the function $\xi_x(\theta_x)$, defined by $\xi_x(\theta_x) = H_2(e_{bx} + \theta_x - q_x\theta_x) - q_x H_2(e_{bx}) - (1 - q_x) H_2(e_{bx} + \theta_x)$ with $q_x = n_x/(n_x + n_z)$. A similar formula for $P_{\theta z}$ can also be derived. Then the total failure probability of phase error rate estimation, $\varepsilon_{ph}$, is given by

$$\varepsilon_{ph} \le P_{\theta x} + P_{\theta z}. \tag{5}$$

In the highly unlikely case that $e_{bx} = 0$ ($e_{bz} = 0$), one can replace it by $n_x e_{bx} = 1$ ($n_z e_{bz} = 1$) to get around the singularity [19]. One can see that $\xi_x(\theta_x)$ is positive when $\theta_x > 0$ and $0 < e_{bx},\, e_{bx} + \theta_x < 1$, due to the concavity of the binary entropy function $H_2(x)$. Note that in the limit of large $n$, $\theta$ can be chosen small. In this case, Eq. (4) yields a result similar to those used in the literature, such as Refs. [5, 13].

6. Privacy amplification [authenticated]: Alice generates an $(n_x + n_z + l - 1)$-bit random bit string and sends it to Bob through an authenticated channel. Alice and Bob use this random bit string to generate a Toeplitz matrix. The final key (with a size of $l$) will be the product of this matrix (with a size of $(n_x + n_z) \times l$) and the key string (with a size of $n_x + n_z$) after passing through the error verification. The failure probability of the privacy amplification is given by

$$\varepsilon_{pa} = (n_x + n_z + l - 1)\,2^{-k_{pa}+1} + 2^{-t_{oe}}, \tag{6}$$

where $k_{pa}$ is the secure-key cost for the authentication and $t_{oe}$ is defined by

$$l = n_x[1 - H_2(e_{bz} + \theta_z)] + n_z[1 - H_2(e_{bx} + \theta_x)] - t_{oe}. \tag{7}$$

The first term in Eq. (6) gives the failure probability of the authentication for the $(n_x + n_z + l - 1)$-bit random bit string transmission. The second term in Eq. (6) gives the failure probability of the privacy amplification given the Toeplitz matrix [28].



7. The final secure key length (net growth [29]) is given by

$$NR \ge l - 2k_{bs} - k_{ec} - k_{ev} - k_{pa} \tag{8}$$

with a failure probability of

$$\varepsilon \le 2\varepsilon_{bs} + \varepsilon_{ev} + \varepsilon_{ph} + \varepsilon_{pa}, \tag{9}$$

where $l$ is given by Eq. (7).
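
The function $\xi_x(\theta_x)$ used in step 5 can be evaluated numerically to check its positivity. The sketch below covers only $\xi$ and the binary entropy; the bound on $P_{\theta x}$ itself is not reproduced. The function names `h2` and `xi` are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy H2(x) in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def xi(e_b: float, theta: float, q: float) -> float:
    """xi(theta) = H2(e_b + theta - q*theta) - q*H2(e_b) - (1-q)*H2(e_b + theta).

    The first argument of H2 is the convex combination
    q*e_b + (1-q)*(e_b + theta), so concavity of H2 makes xi non-negative.
    """
    return h2(e_b + theta - q * theta) - q * h2(e_b) - (1.0 - q) * h2(e_b + theta)


if __name__ == "__main__":
    # Illustrative inputs: 4% bit error rate, 1.07% deviation, biased ratio 0.998.
    print(xi(0.04, 0.0107, 0.998))
```

Because $\xi$ is the gap in a concavity inequality, it vanishes at $\theta = 0$ and grows roughly quadratically in $\theta$ for small deviations.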

One can see that when $n_x + n_z \gg 2k_{bs} + k_{ev} + k_{pa} + t_{oe}$, the final key length given by Eq. (8) is essentially the same as the one given by the Shor-Preskill proof [5].
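
The Toeplitz-matrix multiplication of step 6 can be sketched as follows. This is a direct bit-by-bit product over GF(2), assuming the seed is the $(n_x + n_z + l - 1)$-bit random string; it is not the LFSR-based construction used for authentication, and the function name `toeplitz_hash` is hypothetical.

```python
def toeplitz_hash(key_bits, seed_bits, out_len):
    """Multiply an out_len x len(key_bits) Toeplitz matrix by key_bits over GF(2).

    The matrix entry in row i, column j is seed_bits[i - j + m - 1], with
    m = len(key_bits), so entries are constant along each diagonal.
    seed_bits must therefore contain m + out_len - 1 bits.
    """
    m = len(key_bits)
    if len(seed_bits) != m + out_len - 1:
        raise ValueError("seed must have len(key_bits) + out_len - 1 bits")
    out = []
    for i in range(out_len):
        acc = 0
        for j in range(m):
            # AND is multiplication and XOR is addition in GF(2).
            acc ^= seed_bits[i - j + m - 1] & key_bits[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    key = [1, 0, 1, 1]            # toy verified key (length n_x + n_z = 4)
    seed = [1, 0, 0, 1, 1, 0]     # toy seed (length 4 + 3 - 1 = 6)
    print(toeplitz_hash(key, seed, 3))  # -> [0, 1, 1]
```

The direct product costs on the order of $(n_x + n_z)\,l$ bit operations, so for key lengths in the megabit range a faster multiplication method would be preferable; this sketch only fixes the indexing convention.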

III. PARAMETER OPTIMIZATION 

In order to maximize the final secure key length in the post-processing, Alice and Bob need to consider the failure probabilities from all steps and the corresponding secure-key costs. That is, they need to optimize the key rate, Eq. (8), subject to Eq. (9). The parameters to be optimized are: the biased ratio $q_x$, various secure-key costs ($k_{bs}$, $k_{ec}$, $k_{ev}$, $k_{pa}$, $t_{oe}$) and security parameters ($\varepsilon_{bs}$, $\varepsilon_{ev}$, $\varepsilon_{ph}$, $\varepsilon_{pa}$).

In practice, Alice and Bob can calibrate the QKD system to get an estimate of the transmittance $\eta$ and the error rates $e_{bx}$ and $e_{bz}$. Through some rough calculation of the target length of the final key, they decide the acceptable confidence interval $1 - \varepsilon$ and fix the length of the experiment, $N$, the number of pulses sent by Alice. Then, roughly, the length of the raw key is $n = N\eta$. Thus, in the optimization procedure, the given values (constraints) are $\varepsilon$, $n$, $e_{bx}$ and $e_{bz}$.

The failure probability $\varepsilon$ is chosen by Alice and Bob according to the later practical use of the final key. This relates to the aforementioned composability requirement [10, 11]. For instance, suppose Alice and Bob plan to use the QKD system a million times, and set $\varepsilon$ for each round. Then the total failure probability for this one-million-round use is $10^6\varepsilon$, which should be below some threshold depending on the message security level. From here, one can see that the choice of $\varepsilon$ is not strictly pre-determined. That is, the final security parameter, $\varepsilon$, can slightly deviate from the pre-determined one.

Denote the probability for Alice and Bob to choose the X basis by $p_x$. After the basis sift, Alice and Bob share an $n_x$-bit ($n_z$-bit) key in the X (Z) basis, where roughly (due to fluctuations) $n_x \approx p_x^2 n$ and $n_z \approx (1 - p_x)^2 n$. Thus the biased ratio is given by $q_x \approx p_x^2/[p_x^2 + (1 - p_x)^2]$. In a realistic case, Alice and Bob can optimize $p_x$ first, and then optimize the other parameters after the error verification part, when the real values of $n_x$, $n_z$, $e_{bx}$ and $e_{bz}$ are fixed (known). In this procedure, the biased ratio cannot be strictly optimized due to fluctuations and calibration errors, while the other parameters can be well optimized. In the end, they obtain a secure key rate and calculate the total failure probability with these parameters.
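
The relation between the basis-choice probability and the biased ratio can be sketched as below. The helper names `sifted_lengths` and `biased_ratio` are hypothetical, and the results are expectation values that ignore statistical fluctuations.

```python
def sifted_lengths(n: float, p_x: float):
    """Expected sifted key lengths (n_x, n_z) when both parties pick X with probability p_x."""
    n_x = p_x ** 2 * n          # both chose the X basis
    n_z = (1.0 - p_x) ** 2 * n  # both chose the Z basis
    return n_x, n_z


def biased_ratio(p_x: float) -> float:
    """q_x = n_x / (n_x + n_z), expressed through p_x."""
    return p_x ** 2 / (p_x ** 2 + (1.0 - p_x) ** 2)


if __name__ == "__main__":
    print(sifted_lengths(1e7, 0.96))
    print(biased_ratio(0.96))
```

A modest bias in $p_x$ already produces a strongly biased ratio, because the sifted lengths scale with the squares of the basis probabilities.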

The error correction and phase error rate estimation mainly depend on the biased ratio. Thus, Alice and Bob can group the failure probabilities and secure-key costs into two parts by defining $\varepsilon_3 = 2\varepsilon_{bs} + \varepsilon_{ev} + \varepsilon_{pa}$ and $k_3 = 2k_{bs} + k_{ev} + k_{pa} + t_{oe}$; see Eqs. (7), (8), and (9). The final secure key length can be rewritten as

$$NR \ge n_x[1 - f(e_{bx})H_2(e_{bx}) - H_2(e_{bz} + \theta_z)] + n_z[1 - f(e_{bz})H_2(e_{bz}) - H_2(e_{bx} + \theta_x)] - k_3. \tag{10}$$

We remark that if the contribution from one basis is negative in Eq. (10), Alice and Bob should use the detections from this basis for the parameter estimation only, but not for the key generation.



The optimized secure-key cost for each step is given by [19]

$$\begin{aligned}
t_{oe} &= \frac{k_3}{5} - \frac{4}{5} - \frac{1}{5}\log_2 A, \\
k_{bs} &= t_{oe} + 1 + \log_2 n, \\
k_{ev} &= t_{oe} + 1 + \log_2(n_x + n_z), \\
k_{pa} &= t_{oe} + 1 + \log_2(n_x + n_z + l - 1),
\end{aligned} \tag{11}$$

where $A = n^2(n_x + n_z)(n_x + n_z + l - 1)$. The corresponding failure probability is

$$\varepsilon_3 = 5A^{1/5}\,2^{-(k_3 - 4)/5}. \tag{12}$$

When the final key length is much larger than 37 bits, Alice and Bob can set

$$k_3 = -5\log_2\varepsilon + 4\log_2 n + 50 \tag{13}$$

and the failure probability is $\varepsilon_3 < 10^{-2}\varepsilon$. Since Alice and Bob will recalculate the failure probability in the end and allow the final $\varepsilon$ to deviate slightly from the pre-determined value, they can safely use $\varepsilon_{ph} = \varepsilon$ in the optimization. Thus, the simplified optimization problem has only three parameters to be optimized: $q_x$, $\theta_x$ and $\theta_z$, given $\varepsilon_{ph} = \varepsilon - \varepsilon_3 \approx \varepsilon$.

Observation 1. The main effect of the finite key analysis for the QKD post-processing 
stems from the phase error rate estimation. Inefficiencies due to authentication, bit error 
correction, and privacy amplification are relatively insignificant. 



This can be easily seen from Eqs. (12) and (13). Even in an extreme case where $\varepsilon = 10^{-30}$ and $n = 10^{30}$, the secure-key cost of all the parts other than the phase error rate estimation, given by Eq. (13), is 947 bits ($\ll n$) and its corresponding failure probability is $\varepsilon_3 < 10^{-32}$.
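
The cost allocation of Eqs. (11) to (13) can be sketched numerically as follows. The function names `optimal_costs` and `k3_simplified` are hypothetical, and the costs are left as real numbers, whereas an implementation would presumably round them up to whole bits. The inputs in the usage lines are illustrative.

```python
import math


def optimal_costs(k3: float, n: float, n_x: float, n_z: float, l: float) -> dict:
    """Split a total cost k3 into t_oe, k_bs, k_ev, k_pa following Eq. (11)."""
    a = n ** 2 * (n_x + n_z) * (n_x + n_z + l - 1)
    t_oe = (k3 - 4 - math.log2(a)) / 5
    return {
        "t_oe": t_oe,
        "k_bs": t_oe + 1 + math.log2(n),
        "k_ev": t_oe + 1 + math.log2(n_x + n_z),
        "k_pa": t_oe + 1 + math.log2(n_x + n_z + l - 1),
        # Eq. (12): each of the five failure terms equals 2**(-t_oe).
        "eps3": 5 * a ** 0.2 * 2 ** (-(k3 - 4) / 5),
    }


def k3_simplified(eps: float, n: float) -> float:
    """Simplified total cost of Eq. (13)."""
    return -5 * math.log2(eps) + 4 * math.log2(n) + 50


if __name__ == "__main__":
    # Extreme case from the text: eps = 1e-30 and n = 1e30 give about 947 bits.
    print(k3_simplified(1e-30, 1e30))
    # Illustrative allocation for a raw key of 1e7 bits.
    print(optimal_costs(292, 1e7, 9.216e6, 1.6e4, 4.41e6))
```

The allocation equalizes the five failure terms ($2\varepsilon_{bs}$ counted twice, $\varepsilon_{ev}$, and the two terms of $\varepsilon_{pa}$), which is why $\varepsilon_3 = 5 \cdot 2^{-t_{oe}}$.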

IV. AN EXAMPLE 

Now let us consider an example of the post-processing. Suppose $N = 10^{10}$, $\eta = 10^{-3}$ (then $n \approx N\eta = 10^7$), $e_{bx} = e_{bz} = 4\%$ and $\varepsilon = 10^{-9}$. It is not hard to see that the final key length is much larger than 30 bits. Thus, we can use Eq. (13) to calculate the secure-key cost, $k_3 = 292$ bits.

Now the problem becomes: given $n = 10^7$, $e_{bx} = e_{bz} = 4\%$ and $\varepsilon = 10^{-9}$, optimize the parameters $\theta_x$, $\theta_z$ and $q_x$. Through a numerical program, we get $\theta_x = 1.07\%$, $\theta_z = 0.84\%$ and $q_x = 99.8\%$ (or $p_x = 96.0\%$). Note that, in this case, the bases X and Z are interchangeable due to the symmetry.

With these parameters and Eq. (10), the final secure key length is 4.41 Mb and its corresponding security parameter is $\varepsilon = 1.0095 \times 10^{-9}$ (very close to the predetermined value $10^{-9}$).

In the simulation, we assume the error correction efficiency is 100% (the Shannon limit). In this case, the difference between the "asymptotic-key" length (5.15 Mb) and the "finite-key" length (4.41 Mb) comes from the finite statistical analysis. Note that the cost of all the rest is only $k_3 = 292$ bits and $\varepsilon_3 = 9.5 \times 10^{-12}$. This is consistent with Observation 1: the cost (and the failure probability) due to the finite key analysis mainly comes from the phase error rate estimation.
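
The key lengths quoted in this example can be checked against Eq. (10) with a short script. This is a sketch under stated assumptions: the sifted lengths are taken as $n_x = p_x^2 n$ and $n_z = (1 - p_x)^2 n$ without fluctuations, the error correction efficiency is $f = 1$, the asymptotic length is taken as $n[1 - 2H_2(e_b)]$, and the function names are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy H2(x) in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def finite_key_length(n_x, n_z, e_bx, e_bz, theta_x, theta_z, k3, f=1.0):
    """Lower bound on the key length from Eq. (10), with constant efficiency f."""
    x_part = n_x * (1.0 - f * h2(e_bx) - h2(e_bz + theta_z))
    z_part = n_z * (1.0 - f * h2(e_bz) - h2(e_bx + theta_x))
    # A basis with a negative contribution is used for parameter estimation only.
    return max(x_part, 0.0) + max(z_part, 0.0) - k3


def asymptotic_key_length(n, e_b, f=1.0):
    """Long-key limit assumed here: all n bits sifted in one basis, theta -> 0."""
    return n * (1.0 - f * h2(e_b) - h2(e_b))


if __name__ == "__main__":
    n, p_x = 1e7, 0.96
    n_x, n_z = p_x ** 2 * n, (1.0 - p_x) ** 2 * n
    print(finite_key_length(n_x, n_z, 0.04, 0.04, 0.0107, 0.0084, 292))
    print(asymptotic_key_length(n, 0.04))
```

With these inputs the two printed values come out close to the 4.41 Mb and 5.15 Mb quoted above, and the 292-bit cost $k_3$ is negligible next to the reduction caused by $\theta_x$ and $\theta_z$.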

V. FURTHER DISCUSSION 

1. In the privacy amplification step, Alice and Bob need a common matrix to generate the final secure key. The current way to construct the matrix is by Alice sending a random bit string to Bob, which requires authenticated classical communication. An alternative way is for each of them to generate the matrix from a pre-shared secret key. The main advantage of the second method is that no classical communication is needed for the privacy amplification part. In this case, the error verification step can be done before or after the privacy amplification.

From the LFSR-based Toeplitz matrix construction, we know that Toeplitz matrices can be generated by a much shorter random string [24]. By consuming a $k_{pa}$-bit secure key, Alice and Bob construct an LFSR-based Toeplitz matrix with a size of $(n_x + n_z) \times l$, where $l$ is the key length after the privacy amplification. Two related quantities need to be investigated here: the value of $l$ and its corresponding failure probability.

2. In the security proof, we assume the detection system is compatible with the squashing model, where the single-mode assumption is used, and the imperfection of X and Z measurements and efficiency mismatch are not considered [16]. It is interesting to consider the detector efficiency mismatch with the finite key analysis [17].



3. As noted in Ref. [25], the finite-key analysis for decoy-state QKD is a hard problem. In decoy-state QKD, the fluctuation comes from not only statistics but also hardware imperfections. The question is where the main contribution to the fluctuation comes from and how to quantify these fluctuations. Since QKD systems with coherent states are the most widely used in experiments, investigating the finite-key effect in decoy-state QKD is an important step towards a QKD standard.

4. In order to compare our finite-key analysis to others, such as the one by Scarani and Renner [9], one has to make sure the underlying assumptions (definitions) are the same. Note that in Scarani and Renner's analysis, a trace distance is used for the security definition. For example, it is interesting to investigate how to quantify the efficiency of authentication with the trace distance.